Skip to content

Conversation

@k82cn
Copy link
Contributor

@k82cn k82cn commented Jan 23, 2026

No description provided.

Signed-off-by: Klaus Ma <klausm@nvidia.com>
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @k82cn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, comprehensive example showcasing a Parameter Server architecture for distributed machine learning, built upon the flamepy.rl.Runner framework. Concurrently, it refines the core application configuration by making the working_directory field optional, providing greater flexibility in how applications specify their execution environment. Additionally, the Python SDK's package management has been made more robust, ensuring reliable setup for Python-based applications.

Highlights

  • Refactored working_directory to be Optional: The working_directory field in Application and ApplicationAttributes structs in common/src/apis.rs has been changed from a mandatory String to an Option<String>, allowing applications to specify an optional working directory.
  • Updated Default Working Directory Handling: The default value for working_directory in ApplicationAttributes is now None, and the application execution logic in executor_manager/src/shims/host_shim.rs has been updated to use an executor-specific directory if no working directory is provided.
  • Introduced Parameter Server Example: A new example (examples/ps/) has been added, demonstrating a synchronous Parameter Server training setup using flamepy.rl.Runner with PyTorch for an MNIST-like task.
  • Enhanced Python Package Installation: The Python SDK's _extract_archive and _install_package_from_url methods in sdk/python/src/flamepy/rl/runpy.py have been improved to ensure clean extraction by removing old directories and to use pip install --upgrade for more robust package updates, along with added debug logging.
  • Updated Test Cases and Data Mappings: Various test cases and data mapping implementations across session_manager and common crates have been adjusted to align with the new Option<String> type for working_directory.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a parameter server example, which is a great addition. It also refactors the working_directory attribute for applications to be optional, providing a more sensible default when it's not specified. Additionally, there are some nice improvements to the Python SDK's package installation logic.

My review includes a few suggestions to improve the new example code's robustness, such as handling file paths correctly and ensuring variables are initialized before use. I also found a small improvement for file handling in the Python SDK. Overall, the changes are well-done.

accuracy = evaluate(model, test_loader)
print("Iter {}: \taccuracy is {:.1f}".format(i, accuracy))

print("Final accuracy is {:.1f}.".format(accuracy))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The accuracy variable is only defined inside the if i % 10 == 0: block within the loop. If the loop range were smaller (e.g., range(0)), this line would raise an UnboundLocalError. It would be safer to initialize accuracy to a default value (e.g., 0.0) before the loop.

Comment on lines +19 to +31
with FileLock(os.path.expanduser("~/data.lock")):
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(
"~/data", train=True, download=True, transform=mnist_transforms
),
batch_size=128,
shuffle=True,
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST("~/data", train=False, transform=mnist_transforms),
batch_size=128,
shuffle=True,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The path ~/data is not guaranteed to be expanded by torchvision.datasets.MNIST. It's better to use os.path.expanduser explicitly to ensure the path is resolved correctly. Also, it's good practice to place the lock file within the directory it's protecting to keep things organized. I've refactored this part to define data_dir once and use it for both the dataset and the lock file.

Suggested change
with FileLock(os.path.expanduser("~/data.lock")):
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(
"~/data", train=True, download=True, transform=mnist_transforms
),
batch_size=128,
shuffle=True,
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST("~/data", train=False, transform=mnist_transforms),
batch_size=128,
shuffle=True,
)
data_dir = os.path.expanduser("~/data")
# We add FileLock here because multiple workers will want to
# download data, and this may cause overwrites since
# DataLoader is not threadsafe.
with FileLock(os.path.join(data_dir, ".lock")):
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(
data_dir, train=True, download=True, transform=mnist_transforms
),
batch_size=128,
shuffle=True,
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST(data_dir, train=False, transform=mnist_transforms),
batch_size=128,
shuffle=True,
)

# Check for pyproject.toml or setup.py
if "pyproject.toml" in contents:
pyproject_path = os.path.join(install_path, "pyproject.toml")
with open(pyproject_path, "r") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's a good practice to explicitly specify the file encoding when opening text files. This avoids potential UnicodeDecodeError if the file contains non-ASCII characters and the system's default encoding is not UTF-8. Please add encoding="utf-8".

Suggested change
with open(pyproject_path, "r") as f:
with open(pyproject_path, "r", encoding="utf-8") as f:

k82cn added 2 commits January 24, 2026 01:23
Signed-off-by: Klaus Ma <klausm@nvidia.com>
Signed-off-by: Klaus Ma <klausm@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant